Tango.rs

It used to be that benchmarking required a significant amount of time and numerous iterations to arrive at meaningful results, which was particularly arduous when trying to detect subtle changes, such as those within the range of a few percentage points.

Tango.rs is a benchmarking framework that uses paired benchmarking to assess code performance. The approach relies on the fact that it is far more efficient to measure the performance difference between two functions executing simultaneously than between two functions executed consecutively.
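The idea can be illustrated with a minimal sketch that uses only the standard library. This is not Tango's actual implementation: the helpers `time_ns`, `paired_diffs` and `sum_to` are hypothetical names chosen for the example. It times two functions back to back and records the per-pair difference, so that both measurements in a pair are taken under roughly the same system conditions.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Times a single call of `f` and returns the elapsed time in nanoseconds.
fn time_ns<T>(f: &mut impl FnMut() -> T) -> i64 {
    let start = Instant::now();
    black_box(f());
    start.elapsed().as_nanos() as i64
}

/// Runs `base` and `candidate` back to back `pairs` times and returns the
/// per-pair difference (candidate - base) in nanoseconds.
fn paired_diffs<A, B>(
    mut base: impl FnMut() -> A,
    mut candidate: impl FnMut() -> B,
    pairs: usize,
) -> Vec<i64> {
    (0..pairs)
        .map(|i| {
            // Alternate which function runs first so that neither one is
            // systematically favored by execution order.
            if i % 2 == 0 {
                let b = time_ns(&mut base);
                let c = time_ns(&mut candidate);
                c - b
            } else {
                let c = time_ns(&mut candidate);
                let b = time_ns(&mut base);
                c - b
            }
        })
        .collect()
}

/// Toy workload for the example: sums the integers in 0..n.
fn sum_to(n: u64) -> u64 {
    (0..black_box(n)).fold(0u64, |acc, x| acc.wrapping_add(black_box(x)))
}
```

Calling `paired_diffs(|| sum_to(10_000), || sum_to(9_900), 1_000)` yields 1,000 differences whose mean estimates how much faster or slower the candidate is. A real harness would also batch iterations and handle timer resolution, which this sketch leaves out.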

Features:

very high sensitivity to changes, which allows it to converge on results more quickly than the traditional (pointwise) approach. Often a fraction of a second is enough;
ability to compare different versions of the same code from different VCS commits (A/B-benchmarking);
async support using tokio.rs;
macOS, Linux and Windows support;

1 second, 1 percent, 1 error

Compared to traditional pointwise benchmarking, paired benchmarking is significantly more sensitive to changes. This heightened sensitivity enables the early detection of statistically significant performance variations.

Tango is designed to detect a 1% change in performance within just 1 second in at least 9 out of 10 test runs.
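The reason paired measurements are more sensitive can be shown with a small simulation. The sketch below is only an illustration on synthetic numbers, not Tango's statistics. It assumes a 1000 ns baseline, a candidate that is 1% faster, up to 100 ns of noise shared by both halves of a pair (for example, system load), and up to 2 ns of independent jitter. All names (`Lcg`, `simulate`, `spreads`) are hypothetical.

```rust
/// Tiny deterministic pseudo-random generator (an LCG), so the example
/// needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Returns a value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    (xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64).sqrt()
}

/// Simulates `n` paired samples and returns (base, candidate) timings in ns.
/// Both timings in a pair share one large noise term plus small jitter.
fn simulate(n: usize, seed: u64) -> (Vec<f64>, Vec<f64>) {
    let mut rng = Lcg(seed);
    let mut base = Vec::with_capacity(n);
    let mut cand = Vec::with_capacity(n);
    for _ in 0..n {
        let shared = 100.0 * rng.next_f64(); // noise common to the pair
        base.push(1000.0 + shared + 2.0 * rng.next_f64());
        cand.push(990.0 + shared + 2.0 * rng.next_f64()); // 1% faster
    }
    (base, cand)
}

/// Returns (pointwise spread, paired spread, mean paired difference).
fn spreads(n: usize, seed: u64) -> (f64, f64, f64) {
    let (base, cand) = simulate(n, seed);
    let diffs: Vec<f64> = cand.iter().zip(&base).map(|(c, b)| c - b).collect();
    (std_dev(&base), std_dev(&diffs), mean(&diffs))
}
```

Because the shared term cancels in `c - b`, the spread of the differences is bounded by the jitter alone, while each series on its own carries the full shared noise. Under these assumptions a 10 ns shift is easy to see in the differences and hard to see when comparing the two series separately.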

Prerequisites

Rust and Cargo toolchain installed (Rust stable is supported on Linux/macOS, nightly is required for Windows)
(Optional) cargo-export installed

Getting started

Add the Cargo dependency and declare a new benchmark in Cargo.toml:

[dev-dependencies]
tango-bench = "0.5"

[[bench]]
name = "factorial"
harness = false

Allow rustc to export symbols for dynamic linking from benchmarks:
- (Linux/macOS) Add a build script (build.rs) with the following content
```
fn main() {
    println!("cargo:rustc-link-arg-benches=-rdynamic");
    println!("cargo:rerun-if-changed=build.rs");
}
```
- (Windows, nightly required) Add the following to the Cargo config (.cargo/config)
```
[build]
rustflags = ["-Zexport-executable-symbols"]
```

Add benches/factorial.rs with the following content:

use std::hint::black_box;
use tango_bench::{benchmark_fn, tango_benchmarks, tango_main, IntoBenchmarks};

pub fn factorial(mut n: usize) -> usize {
    let mut result = 1usize;
    while n > 0 {
        result = result.wrapping_mul(black_box(n));
        n -= 1;
    }
    result
}

fn factorial_benchmarks() -> impl IntoBenchmarks {
    [
        benchmark_fn("factorial", |b| b.iter(|| factorial(500))),
    ]
}

tango_benchmarks!(factorial_benchmarks());
tango_main!();

Build and export benchmark to target/benchmarks directory:

$ cargo export target/benchmarks -- bench --bench=factorial

Now let's try to modify factorial.rs and make factorial faster :)

fn factorial_benchmarks() -> impl IntoBenchmarks {
    [
        benchmark_fn("factorial", |b| b.iter(|| factorial(495))),
    ]
}

Now we can compare the new version with the already built one:

$ cargo bench -q --bench=factorial -- compare target/benchmarks/factorial
factorial             [ 375.5 ns ... 369.0 ns ]      -1.58%*

The result shows that there is indeed a ~1% difference between factorial(500) and factorial(495).

Additional examples are available in the examples directory.

Async support

To use Tango.rs in an asynchronous setup, follow these steps:

Add tango-bench with the async-tokio feature enabled to your Cargo.toml:

[dev-dependencies]
tango-bench = { version = "0.5", features = ["async-tokio"] }

[[bench]]
name = "async_factorial"
harness = false

Create benches/async_factorial.rs with the following content:

use std::hint::black_box;
use tango_bench::{
    async_benchmark_fn, asynchronous::tokio::TokioRuntime, tango_benchmarks, tango_main,
    IntoBenchmarks,
};

pub async fn factorial(mut n: usize) -> usize {
    let mut result = 1usize;
    while n > 0 {
        result = result.wrapping_mul(black_box(n));
        n -= 1;
    }
    result
}

fn benchmarks() -> impl IntoBenchmarks {
    [async_benchmark_fn("async_factorial", TokioRuntime, |b| {
        b.iter(|| async { factorial(500).await })
    })]
}

tango_benchmarks!(benchmarks());
tango_main!();

Build and use the benchmarks as you do in the synchronous case:
```
$ cargo bench -q --bench=async_factorial -- compare
```

Runner arguments

There are several arguments you can pass to the compare command to change its behavior:

-t, --time – how long to run each benchmark (in seconds)
-s, --samples – how many samples to gather from each benchmark
-f – filter benchmarks by name. Glob patterns are supported (e.g. */bench_name/{2,4,8}/**)
-d [path] – dump CSV with raw samples into the given directory
--gnuplot – generate a plot for each benchmark (requires gnuplot to be installed)
-o, --filter-outliers – additionally filter outliers
-p, --parallel – run base/candidate functions in 2 different threads instead of interleaving them in a single thread
--fail-threshold – fail if the new version is slower than the baseline by the given percentage
--fail-fast – fail after the first benchmark that exceeds the fail threshold, rather than after the whole suite
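
The interplay between --fail-threshold and --fail-fast can be sketched as follows. This is a hypothetical illustration of the described behavior, not Tango's code: the `Comparison` type and the `change_percent` and `failures` functions are assumptions made for the example, and the exact statistic Tango compares against the threshold may differ.

```rust
/// One comparison result: mean time per iteration for baseline and candidate.
/// (Hypothetical type for illustration; not part of tango-bench.)
struct Comparison {
    name: &'static str,
    base_ns: f64,
    candidate_ns: f64,
}

/// Relative change of the candidate versus the baseline, in percent.
/// A positive value means the candidate is slower.
fn change_percent(c: &Comparison) -> f64 {
    (c.candidate_ns - c.base_ns) / c.base_ns * 100.0
}

/// Returns the names of benchmarks whose slowdown exceeds `threshold_percent`.
/// With `fail_fast` set, checking stops at the first offender.
fn failures(
    results: &[Comparison],
    threshold_percent: f64,
    fail_fast: bool,
) -> Vec<&'static str> {
    let mut failed = Vec::new();
    for c in results {
        if change_percent(c) > threshold_percent {
            failed.push(c.name);
            if fail_fast {
                break;
            }
        }
    }
    failed
}
```

With a threshold of 5, a benchmark that goes from 100 ns to 110 ns counts as a failure, while one that goes from 100 ns to 101 ns does not. A CI job would typically exit with a non-zero status when the returned list is non-empty.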

Contributing

The project is in its early stages, so any help will be appreciated. Here are some ideas you might find interesting:

find a way to provide a more user-friendly API for registering functions in the system
if you're a library author, trying out Tango and providing feedback will be very useful